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Abstract 

Multivariate geostatistics is based on modelling all covariances between all possible com¬ 
binations of two or more variables at any sets of locations in a continuously indexed domain. 
Multivariate spatial covariance models need to be built with care, since any covariance matrix 
that is derived from such a model must be nonnegative-definite. In this article, we develop a 
conditional approach for spatial-model construction whose validity conditions are easy to check. 
We start with bivariate spatial covariance models and go on to demonstrate the approach’s con¬ 
nection to multivariate models defined by networks of spatial variables. In some circumstances, 
such as modelling respiratory illness conditional on air pollution, the direction of conditional 
dependence is clear. When it is not, the two directional models can be compared. More gen¬ 
erally, the graph structure of the network reduces the number of possible models to compare. 
Model selection then amounts to finding possible causative links in the network. We demon¬ 
strate our conditional approach on surface temperature and pressure data, where the role of the 
two variables is seen to be asymmetric. 


1 Introduction 

The conditional approach to building multivariate spatial covariance models was introduced by|Royl^ 


et al. (19991. In that article, pressure and wind fields are modelled as a bivariate process over a 


region of the globe, with the wind process conditioned on the pressure process through a physically- 
motivated stochastic partial differential equation. In general, such models exhibit asymmetry; that 
is, for yi(-) and V 2 (’) defined on d-dimensional Euclidean space 

cov{Yi(s), V 2 (m)} a cov{l 2 (s), Vi(u)}, s,u G 


Of course, it is always true that cov{Vi(s), V 2 (w)} = cov{V 2 (u), Yi(s)}. 

There are commonly-used classes of multivariate spatial models that assume symmetric, sta¬ 
tionary dependence in the cross-covariances; that is, they assume Ci 2 {h) = cov{Yi(s), ¥ 2(3 + h)} = 
cov{l 2 (s), 11(5 + d)} = C' 2 i(d), for d G (e.g., Gelfand et al. 2004 Cressie & Wikle 2011 Section 
4.1.5; Genton & Kleiber 20151. The most notable of these symmetric-cross-covariance models is 
the linear model of coregionalization; see, for example, Journel & Huijbregts (1978 Section III.B.3), 
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Webster et al. (1994), Wackernagel (1995), and Banerjee et al. (2015 Section 9.5). While symmetry 


may reduce the number of parameters or allow fast computations, it may not be supported by the 
underlying science or by the data. 


Ver Hoef & Cressie (1993) avoid making symmetry restrictions by working with variance-based 


cross-variograms and propose a spatial shift parameter to express asymmetry. |Genton fc Kleiber] 


(2015) review other approaches that capture asymmetry and include those of Apanasovich & Genton 


(2010) and Li & Zhang (2011); see also Ghristensen & Amemiya (2001). In multivariate spatial- 
lattice modelling, Sain & Gressie (2007), Sain et al. (2011), and Martinez-Beneito (2013) specifically 
include asymmetry in their models. 

A key outcome of multivariate geostatistics is optimal spatial prediction of a hidden multivari¬ 
ate spatial process, F(-) = {Yi(-),..., y),(-)}^, based on multivariate noisy spatial observations, 
{Z,{ Sqi) : i = 1,... ,TOq, q = 1,... ,p}, of the hidden processes {Yq{-) : q = 1,... ,p\. Assuming 
additive measurement error, £q{-), we have data Zq{-) = Yq(-) + £q{-) at the niq data locations, 
= {sqi : i = 1,... jiriq}, for q = 1,... ,p. Notice that we have not assumed colocated data for 
the different spatial variables. Optimally predicting just one of the processes, say yi(-), using the 
multivariate data {Zq{sqi)}, is often called cokriging. 

Gontributions to multivariate-spatial-prediction methodology include those of Myers (1982,1992), 


Ver Hoef & Gressie 

1993), 

Wackernagel 

(1995), 

Gressie & Wikle 

(1998 

, Royle & Berliner 

(1999), 

Gelfand et al. 

(2004) 

, Majumdar & Gelfand 

(2007 

), Finley et al. ( 

2008), I 

luang et al. 

(2009 

), Cressie 


& Wikle (2011 Section 4.1.5), Furrer & Genton (2011), Heaton & Gelfand (2011), and Banerjee 


et al. (2015, Chapter 7). 


Genton & Kleiber (2015) give a comprehensive review of many different ways that valid mul¬ 


tivariate covariances can be constructed, with a brief mention of the conditional approach. For 
spatial-lattice data, Kim et al. (2001) and Jin et al. (2005) use a conditional approach to modelling 
multivariate spatial dependence. For regularly or irregularly gridded spatial processes, [Cressie ^ 


Wikle (2011 p. 234) clarify the discussion of the conditional approach given in Gelfand et al. (2004). 


For geostatistical data, Heaton & Gelfand (2011) build a multivariate model for predicting Z 2 from 
Zi by conditioning on Zi and a kernel-smoothed Zi. In this article, we show that |Royle et al. (1999) 
and Heaton & Gelfand (2011) describe specific cases of a large class of multivariate models whose 
existence we establish. 


2 Modelling joint dependence through conditioning 

In this section, we introduce the conditional approach by considering the bivariate case. Here, 
{(yi(s), 1 ^ 2 ( 5 )) : s £ D C are two co-varying spatial processes in a continuous-spatially-indexed 
domain D of positive volume contained in d-dimensional Euclidean space the multivariate case is 
considered in Section]^ As was seen in Sectionit is sometimes convenient to write the individual 
processes as yi(-) and F 2 (-), respectively. Then the joint probability measure of Yi(-) and Y 2 {-) can 
be written as, 

[^(•),i^2(-)] = r2(-)|n(-)][n(-)], (1) 

where we use the convention that [A \ B] represents the conditional probability of A given B, and 
[B] represents the marginal probability of B. The conditional probability in ([^ is shorthand for 
[{ 12 ( 5 ) : s € iJ}|{yi(?;) : v € U}], which we see in Sectionj^is defined through the finite-dimensional 
distributions. In this article, we are particularly interested in the conditional distributional properties 
of F 2 (s) and of {Y 2 (s),Y 2 (u)}, given {Fi(z)) : v G Bj. 

The order of the variables is a choice, but it is generally driven by the underlying science; for 
example, yi(-) might be ambient ozone in a city and Y 2 {-) might represent the spatial intensity 
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or density of respiratory illness in the city; or Yi(-) might be a temperature field and i 2 (-) might 
be a rainfall field, where i 2 (-) depends to some extent on Fi(-) through evapo-transpiration and 


the Penman-Monteith equation (e.g., Beven, 1979). When the order is not obvious, both models 
can be fitted and the best one selected, indicating discovery of a possible causative link. For the 
multivariate case in Section]^ it is enough to have a partial order on the variables or, equivalently, 
a directed acyclic graph ( |Cressie &: Davidson 1998). 

Assume that £'{yi(-)} = 0 = E{Y 2 {-)}] we relax this in Section]^ Consider the following model 
for the first two conditional moments of [{^ 2 ( 3 ) : s G -Dj | yi(-)]; 


E{V 2 (s) I Yi(-)} = [ b{s,v)Yi{v)dv, sGD, 
Jd 

COv{r2(s),y2(M) I hl(-)} =C2\l{s,u), S,UGD, 


( 2 ) 

(3) 


where &(•, •) is any integrable function that maps from x into K, and C' 2 |i(-, •) is a univariate 
covariance function that does not depend functionally on Fi(-). In ([^, 6(-, •) may be obtained from 
scientific understanding of how Y 2 {-) evolves from {yi(u) : v € -Dj. Hence, we call b an interaction 
function, and it has an important role in scientific modelling of positive or negative dependence of 
Y 2 on Yi. Recall from Section that Yq is observed with measurement error, resulting in Zg, for 
q = 1,2. Unlike in Royle et ( |1999 1 and Heaton & Gelfand (2011), the focus of ([^ and ([^ is on 
the latent processes Yi and Y 2 , rather than on Zi and Z 2 . Important special cases of ([^ include 
b{s, v) proportional to a kernel smoothing function and 5(s, v) proportional to a Dirac delta function, 
which describes pointwise dependence. 

Critically, the conditional covariance function C 2 \i in is necessarily a nonnegative-definite 


function, and there are many classes of such functions available (e.g., Christakos 1984 Cressie, 1993 


Section 2.5; Banerjee et al.[ 2004 Section 2.2). Finally, suppose that Yi(-) has a valid univariate 


covariance function which is also necessarily nonnegative-dehnite. Thus, the conditional 

approach requires only specification of an integrable interaction function and two valid univariate 
spatial covariance functions, C 211 and Cn, leading to rich classes of cross-covariance functions. 


Section [33] gives one such class. 

Define C, 

covariance models, C 211 and Cn, we have: 


qr{s, u) = cov{yq(s), Yr(u)}, foi' q,r = 1,2 and s,u G D. From the two univariate spatial 


(722(5, U) = COv{y2(s), Y2 (m)} 

= cov[U{Y2(s) I Y,{-)},E{Y2 {u) I Y,i-)}]+E[coY{Y2is),Y2iu) \ Yi(.)}] 

= / / b{s,v)Cii{v,w)b{u,w) dvdw + C 2 \i{s,u), s,uGD. 

J D J D 


(4) 


When M = s in one can see that var{Y 2 (s)} can be expressed as a decomposition of spatial 
variation due to its regression on Yi(-) plus the remaining variation, (72|i(s, s), unexplained by Y 2 ’s 
dependence on Yi. In general, ([^ shows a decomposition of spatial covariation into an explanatory 
component and a descriptive component. 

Importantly, the formulas for the cross-covariances are straightforward: 

Ci2(s,m) = cov[Fi(s),F;{y2(M) I Ti(-)}] = [ Cii{s,w)b{u,w)dw, s,uGD, ( 5 ) 

Jd 

which has only an explanatory component. The other cross-covariance is obtained from 

(72i(s,u) = (7i2(m, s), s,uGD. (6) 
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Finally, recall that 

<711(5, u) = cov{Yi(s),yi(M)}, s,u€D, ( 7 ) 

is a given nonnegative-definite function, and this is descriptive only of spatial covariation in Yi. 

Then Q-Q specifies all covariances {Cqr{‘, Oli any covariance matrix obtained from them 
will be nonnegative-dehnite; see Sectionj^ From ([^, Ci 2 {u, s) = <7ii(it, w)b{s, w)dw ^ <7i2(s, u), 
in general, because 6(-, •) may be asymmetric. That is, the conditional approach captures asymmetry 
naturally through the interaction function. 


3 Bivariate stochastic processes based on conditioning 


3.1 Existence of a bivariate stochastic process 

Let [{Fi°(s), F2 °(s)} : s e R'^] be a bivariate Gaussian process with mean 0 , covariance functions 
( 7 °;^(-, •), < 722 (-, •)) cross-covariance functions < 7 ° 2 (-, •)) ')■ Then for any pair of nonnegative 

integers ni,n2 such that ni -|- 712 > 0 ; for any locations {sifc : fc = 1 ,..., ni}, {52; : ^ = 1 ,..., 712}) 
and for any real numbers {aik : fc = 1 ,..., tii}, {02; : Z = 1 ,..., 772}, 


var {sik) + ^ 0 .2iy2 {^ 21 ) I” 

[k^l 1^1 ) 

ni ni 712 ti2 

= EE + EE (^2l(^2l'C22i^2h S 2 I') 

k^l k'^1 1^1 

ni 712 712 Til 

+ EE aika2i'Ci2{sik, S 21 ’) -b EE a2iaik'C2iis2i, sik') > 0. 


( 8 ) 


fc=i i'=i 


/=i k'=i 


Conversely, suppose that the set of functions, {Cqr{-,-) ■ = li2}, has the property that 

Ci 2 {s,u) = C 2 i{u,s), for all s,u G and that Q holds. Then there exists a bivariate Gaussian 
process {(yi(s), 12 ( 5 )) : s G such that 

cov{y'q(s), 17 (w)} = Cqr{s, u), s,u G q,r = 1, 2. 


The proof of this result relies on establishing the Kolomogorov consistency conditions (e.g., Billings¬ 


ley 1995 pp. 482-484) for the finite-dimensional distributions of 

{Ti(sil), . ■ . , Fl(si„J, 12 ( 521 ), . . . , T2(52„2)}. 


They are specihed to be Gaussian with second-order moments defined by 0-0. The consistency 
conditions are: the finite-dimensional distributions are consistent over marginalization; and permu¬ 
tation of the variables’ indices does not change the probabilities of events, which we now establish. 

Consider {Cqr(-, •)} defined by 0-0. Because the finite-dimensional distributions are Gaussian, 
permutation-invariance is guaranteed by 0, an expression for covariances. The right-hand side of 
0 consists of (72 |i(-, •)? which is nonnegative-dehnite, added to a quadratic term that is guaranteed 
to be nonnegative-dehnite, since <7ii(-, •) in 0 is nonnegative-dehnite. Hence, <722(-, •)> which is the 
sum of these two terms, is nonnegative-dehnite. Thus, marginally, 12(0 Tas a nonnegative-dehnite 
covariance function, but this is not enough. It remains to establish 0. Substitute 0 and 0 into 
the left-hand side of 0 to obtain 

712 712 

02(02/'<721 1 ( 52 ;, 52/') + 

1=1 l' = l 



/ / a(5)a(7i)(7ii(5, m) d5d7i, 

J D J D 


(9) 
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where for 6 {-) the Dirac delta function, 


ni 712 

a(s) = ^ au.( 5 (s - sife) + ^ a2ib{s2i,s), s G 

k^l 1^1 

Since C 2 \i and Cu are nonnegative-definite, 0 is nonnegative, resulting in (|^. 

Only nonnegative-definite functions for univariate processes are needed in the conditional ap¬ 
proach. Further, the finite-dimensional distribution, 

[{Yi{sik),Y 2 {s 2 i) : A: = 1,... ,ni; Z = 1,... ,n 2 }] 

= [{d^2(s2/) : Z = 1,... ,n2} I {yi(sife) : fc = 1,..., ni}][{Yi(sifc) : fc = 1,... ,ni}]. 


depends critically on the finite collection of interaction functions, {b{s 2 i, ■) : I — 1, ■.., 712 }. The only 
condition we place on 6(-, •) is that it is a real-valued integrable function. 

The existence proof given above shows that there is at least one process with covariance func¬ 
tions given by @-0- However, the modeller is not restricted to fitting bivariate Gaussian pro- 
Zammit-Mangion et al. (2015a I fit a non-Gaussian model constructively through ([^. 


cesses. 


In practice, geostatistical software will discretize the continuous spatial domain D onto a fine- 
resolution finite grid defined by the spatial lattice, = {si,..., s„}, which represents the centroids 
of the grid cells. That is, Yq{-) is replaced with the vector Yq = {¥^( 51 ),..., yq(s„)}^, q = 1,2. 
Under this discretization, @-0 become, respectively. 


cov(y2) — ^2|i T BYiiB^ 
coy(Yi,Y2) =SiiH^, 
cov(y2,Ui) 

cov(ri) =i;ii, 


( 10 ) 

( 11 ) 

( 12 ) 

(13) 


which were given by Cressie & Wikle (2011, p. 160) and were used by Jin et al. (2005) for modelling 


bivariate spatial-lattice data. In (10)-(13), S 2 I 1 and Sn are nonnegative-definite nx n covariance 
matrices obtained from {C' 2 |i(sfc, S;) : fc, Z = 1,..., n} and {Cii{sk, s;) : fc, Z = 1,..., n}, respectively, 
and B is the square n x n matrix obtained from {b{sk-, s/) : k,l = 1,..., n}. Hence, the following 
2n X 2n joint covariance matrix is nonnegative-definite: 


cov{(yl^y2")"} = 


Ell 

HSn 


YiiB^ 

E211 + BYiiB^ 


(14) 


Banerjee et al. ( 2015[ p. 273) state that it is meaningless to talk about the joint distribution of 


1^2(51) I lu(si) and 12(52) | 11(52) as building blocks for the conditional approach, with which we 
agree. They also go on to say that this “reveals the impossibility of conditioning,” with which we 
disagree. We have shown in this section that the conditional approach yields a well-defined bivariate 
Gaussian process {Fi(-), l2(-)}) since conditioning is on the whole process Yi(-). This implies a 
well-defined joint distribution of the random vectors Yi and I 2 J obtained from discretization, whose 
joint distribution is given by [li,l 2 ] = [T 2 | Ti][Fi], where [I 2 | li] is a A^(Hli,E 2 |i) density, and 
[Yi] is a N{ 0 , Ell) density. This relation is deceptively simple, but the existence proof above shows 
how such relations are founded in the joint probability measure of IJ)-) and l2(’)- 

The conditional density [1^2 I li] is derived from a linear regression of 1^2 on IJ, not on the ob¬ 
served variable Zi. The errors-in-variable model (Berkson 1950 Heaton & Gelfand 2011) considers 


a regression of noisy observations Z 2 on noisy observations Z \, which is different from the approach 
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we take. For our conditional approach, the conditioning is on the whole vector Yi, but any marginal 
or conditional finite-dimensional distribution can be easily derived. For example, [^ 2 ( 51 ) | Yi(si)] 
can be obtained from [yi(si), Y 2 (si)]/[Fi(si)], as follows. The numerator is 

[n(si),T2(si)] = [ ■■■ [ [Y2(si) I Fi][ri]dYi(s2)...dyi(s„), 

Jm. Jm. 


which from (14) is Gaussian with mean 0 and 2x2 covariance matrix. 


^ 11 ( 51 ,Si) 

ELi Cii{si,Sk)bik 


J 2 k=i Cii{si,Sk)bik 

^ 211 ( 51 , si) + X]fe=i bikCii{sk, si)bii 


where bik is the (i,fc)th element of B in (10)-(12), and the denominator is A^(0, < 711 ( 51 , Si)). 


We have seen above that it is not just one or a few finite-dimensional distributions that define 


the conditional approach, it is all of them. Banerjee et al. (2015, p. 273) state that the conditional 


approach is flawed and that kriging is not possible. In Section [3.2[ we give a simple, one-dimensional 
example of the conditional approach defined by 0-0 with kriging and cokriging equations for 
predicting {Yi(so) : Sq G from noisy incomplete data, { Zq { sqi ) : i = l,...,mq, q — 1,2}. We 
deliberately chose not to predict the dependent variable Y 2 to illustrate the flexibility of having a 


fully bivariate model. Zammit-Mangion et al. (2015b) show that the important scientific problem of 


predicting methane fluxes results in cokriging of this type. 

The incorporation of non-zero mean functions in {Yi(-): ^2(’)} i® straightforward. Let /ri(-) and 
M 2 (’) be real-valued functions defined on and suppose that the finite-dimensional Gaussian distri¬ 
butions obtained from {yi(sifc), ^ 2 ( 52 ;) : fc = 1,..., ni; Z = 1,..., 712 } have means {/ri(sifc), ijl 2 {s 2 i) '■ 
k = 1,..., ni; I = 1,..., 77 , 2 }, respectively. Then the method of proof at the beginning of this section 
yields a bivariate Gaussian process {yi(-),y2(’)} with mean functions {/7i(-),/i 2 (-)} and covariance 
functions {Cqr{-,-) : q,r = 1,2}. Covariates Xi(-) and X 2 {-) can then be incorporated through 
Hq{s) = Xq{sYI3q {s G D, q = 1,2), where /3i and /32 are vectors of regression coefficients of possibly 
different dimensions. 

3.2 Cokriging using covariances defined by the conditional approach 


Section 3.1 establishes the existence of the bivariate process {Yi(-)jb2(’)} with {Cqr{-,-)} given 


by 0-0, and hence we may use cokriging for multivariate spatial prediction in the presence of 
incomplete, noisy data. 


The aim of cokriging is to predict, say, Yi(so), sq S D, based on data and Z 2 (e.g., Cressie 


1993, p. 138), where 

Zq = { Zq { s ) : s e = { sq , : 7 = 1,..., 777,}, q=l,2- 


(15) 


Recall that Zq { sqi ) = Yq { sqi ) + eq { sqi ), E { eq {-)} = 0, and var{eq(-)} = (i = 1,.. . ,777,; q = 1,2). 
Then, the best predictor for yi(so) is the conditional mean, i?{Yi(so) | Yi, ^ 2 }- Assuming mean-zero 
joint Gaussian processes. 


Fi(so) = A{yi(so) I Zi,Z 2 } = 




" 

1_1 

^12 

1 - 


where for q,r = 1, 2, 


Cii + crYlmi C12 

C 21 C 22 + <xlYm 2 


Zi 

Z 2 


(16) 


Ir — {Gi7.(5o, Sri') ■ ^ — 1 ; • ■ ■ 5 777,.} , Cqr — \Cqr (Sqi, Srj) . 7 — 1 , . . . , Tflq , J — 1 , . . . , 777j.} , 
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and Irrig is the m, x niq identity matrix. Expression (161 is called the simple-cokriging predictor, 
and it is also the best linear predictor of yi(so). 

While in some multivariate models, the matrices {Cqr ■ = 1)2) are known in closed form 


(Genton & Kleiber 2015), this is not necessarily so here. Cokriging using the conditional approach 


may require integrations over D in order to compute {Cqr). There are examples where the integrals 
can be carried out analytically. One such example is given in Appendix 1. 

To demonstrate the benefits of cokriging based on a bivariate spatial model defined by the con¬ 
ditional approach, we simulated the processes Yi, Y 2 on a regular discretisation, , of D = [—1,1], 
where \D^\ = 200. We describe the covariations in C'ii(-,-) and C' 2 |i(-,-) by Matern covariance 
functions. 


Cii{s,u) = 


C' 2 |i(s, u) = 


'll 


2-ii-ir(z/ii) 


' 2|1 


2"^i^-W(i/2|i) 


{kii\u - Ky^^{Hiii\u - s|). 


{k 2 \i\u - K^^^^{k 2 \i\u - s\), 


(17) 

(18) 


where we set the variances to ah = = 0.2, the scale parameters to kh = 25,«;2|i = 75, the 

smoothness parameters to = j^ 2 |i = 1-5) and where is a Bessel function of the second kind of 
order i/. For the interaction function, we used the shifted bisquare function 


b(s,v) = 


dl{l-(|u-s-A|/r)2}^ 

0 , 


|u — s — A| < r, 
otherwise. 


(19) 


where we set the shift parameter to A = —0.3 to capture asymmetry, we set the aperture parameter 
to r = 0.3, and we set the scaling parameter to A = 5. The grid cells were used to define the 
discretized domain over which we carried out the numerical integrations in and (§. For example, 
Ci 2 (so,u) ~ I]fe=i ^fee'll(so, Wfc)6(it,rcfe), where = (wk ■■ k = 1,n) and (r/k : k = 1,n) 
are the grid spacings. Here rji = ■ ■ ■ = 77200 = 0.01. The covariance matrix (14) is shown in Fig. 


left panel, where asymmetry is clearly present. Finally, the data Zi and Z 2 in (15) were generated 
by adding independent, mean-zero Gaussian measurement errors with variances a^ = a^ = 0.25 to 
Yi and Y 2 at given locations and respectively. Here, and n [0,1], so 

that Yi is observed only for s > 0. 


We used the cokriging equation (16) to obtain Yi = {Yi(so) : sq S D^}'^ based on the simulated 


observations Z\ and Y 2 . We compared Y\ to a kriging predictor Y\ based only on data Yi, where 
Yi = {Yi(so) : So € D^}'^ and Yi(so) = chiCn + ahlmi )~^We also compared Yi to a 
misspecified cokriging predictor Y^, where a misspecified symmetric model with A = 0 is substituted 
into (19) and hence into ([^ . Since the misspecification is in the interaction function, their parameters 
A and r, with A = 0, were re-estimated by maximum likelihood based on Zi and Y 2 . As seen in 
Fig. right panel, the cokriging predictor Yi is representative of the true process Yi, even where it 
is not observed. However, the kriging predictor Yi can only shrink to the mean, E{Yi{-)} = 0, in 
spatial regions where there are no observations; and Y ^, which is based on a misspecified symmetric 
model, is clearly a very poor predictor. Gokriging prediction of the dependent variable Y 2 is omitted 
here for the sake of brevity. 


3.3 Deriving classes of cross-covariance functions from marginal covari¬ 
ance functions 

Our conditional approach may also be used to complement the joint approach to constructing mul¬ 


tivariate covariance functions. In particular, Genton & Kleiber (2015) posed an open problem that 
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Figure 1: Cokriging using spatial covariances defined by the conditional approach. Left panel: 


The covariance matrix (14). Right panel, top: The simulated observations Z\ (open circles) and 


(dots). Right panel, bottom: The hidden value Y\ (solid line), the kriging predictor Yi (dashed line), 
the m 
line). 


the misspecified cokriging predictor (dotted line), and the cokriging predictor Yi (dotted-dashed 


seems difficult when using a joint approach: “[GJiven two marginal covariances, what is the valid 
class of possible cross-covariances that still results in a nonnegative-definite structure?”. A straight¬ 
forward answer is available through our conditional approach. The class of cross-covariance functions 
is given by ([^ for any integrable function b{s,v) such that the function C' 2 |i(-, •) obtained from (|^ 
is nonnegative-definite. This is potentially a very rich class of cross-covariance functions, and an¬ 
swering the question reduces to verifying which choice of &(•,•) in 0 yields a nonnegative-dehnite 
C2|i(’) •)■ 

For example, consider the stationary case in D = where we have stationary covariance 
functions C'ii(h),C' 2 |i(h), and interaction function b{s,v) = bo{v — s). Then from (|^, 

C2\i{h) = C22{h) - [ [ boiv)boiw)Cii{h - V+ w)dvdw. 

Let a; S denote spatial frequency, and let rii(a;),F 22 (w), and Bo{u}) be the Fourier transforms 
of Cii{h),C 22 {h), and bo{h), respectively. Then, for C' 2 |i(h) to be a valid covariance function, it 
is required that T 22 {<^) — ??o(w)i?o(—w)rii(w) be nonnegative and integrable over w S (Cressie 


& Huang 1999 Gneiting, 2002). The nonnegativity is trivial if rii(a;) = 0, hence consider those 
a; S H for which 

Bo{u!)Bo{—oj) < r22(a;)/rii(a;), (20) 

where rii(a;) > 0. Recall that Cii{h) and C 22 {h) are covariance functions and hence, necessarily, 
Fii(w) > 0 and r 22 (w) > 0. Further, Bo{uj)Bo{—uj) > 0, trivially. 


Any i?o(-) that satisfies (20) gives the required result, because finiteness follows from J F 22 (w)da; < 
oo being an upperbound on the integral, f F 22 (w) — Bo(uj)Bo(—u)rii(uj) dw. Notice that rii(-) and 
F 22 (-) are Fourier transforms of any pair of stationary covariance functions, and that the squared 


modulus of Bo has only to stay below the envelope given by the right-hand-side of (20). From our 


conditional approach, we see that there are many solutions to Genton and Keiber’s open problem. 
Appendix 1 shows how to obtain a class of valid Matern cross-covariance functions developed by 


Gneiting et al. 

to 

o 

o 

) that satisfies ( 
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4 Multivariate spatial models through conditioning 
4.1 Definition of cross-covariance functions 

In this section, we extend our conditional approach from the bivariate to the multivariate case. 
Initially, we work with the variables in their original ordering and subsequently show how directed 
graphical models introduce parsimony into the conditional approach. Now, • • ■ jlp(’)] can be 

decomposed as 


[yp(-) I X I X ••• X [yfi-)]. 


( 21 ) 


First, we set cov{Yi(s), li('u)} = < 711 ( 3 , it); s,u G D. Analogous to the bivariate case p = 2, we 
define the first two conditional moments of yq(-), for q = 1,... ,p, as 

9-1 » 

E[Yq{s) \ {Yr.{-) : r = 1,... ,{q-1)}]='^ bqr{s,v)Yr{v)dv, sGD, (22) 

.-1 Jd 


cov[yq(s),r<;(M) I {Yr{-) : r = 1,..., {q - 1)}] = (7q|(^<q)(s, u), s,u G D, 


(23) 


where {bqri-, •) : r = 1,..., (g — 1); q = 2,... ,p} are integrable functions that give the conditional 
relationship of the rth process on the qth. process, for r < q. 

As a result of the decomposition in (21), we obtain from (22) and (23) the following expression 
for the marginal covariance functions. For q = 1,... ,p. 


Cqq(s,u) = COv{Yq(s),Yq(u)} 

g-1 g-1 


= V' V' / / bqris,v)Crr'(v,w)bqr'(u,w)dvdw + Cql(^r-<q)(s,u)- (24) 

r=lr' = l-^DJD 


Once again, we see that the covariation, here given by (24), is decomposed into an explanatory 
component and a descriptive component. 

For r = 1,..., q — 1, the cross-covariance functions are 


(7rg(s, u) = COv{lX.(s),yq(lt)} = ^ / / bqr>(u,w)Crr'(s,w)d 


(25) 


and Cqr{s,u) = Crq{u,s). Expressions (24) and (25) depend on C^r', for r, r' < q, which are defined 
iteratively: Starting with q = 2, C 22 , <7i2, and C 21 depend on Cu and C 2 \i. The same idea is 
repeated for g = 3,... ,p. 


4.2 Existence of a p-variate process 

Following the discussion in Section[3.1[ the existence of a p-variate Gaussian process with covariance 


and cross-covariance functions given by (24) and (25) follows by showing that 


var' 


EE ^qkYqi^Sqf^^ / ^ 0; 

,g=lfc=l J 


(26) 


for any real numbers {uqk : fc = 1,..., Uq; q = 1 ,... ,p}, any nonnegative integers {n, : q = 1,... ,p} 
such that ni -I- • • • -I- Up > 0, and any {sqk ■ k = 1,... ,nq] q = 1,... ,p}. In Appendix 2, we 
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demonstrate that (261 is equal to 


rip Up p-ip-i 

^ ^ ^ ^ (g<p) (^pm5 ^ ^ / / Cf/q(s)cf/p(u)Cqp(s^ u)(isdu^ 

m=lm' = l q=lr=l'^^'^^ 


where 


/(s) = S X! + 


.fc=i 


m=l 


^pm^pqi^Spm-) '^) 


(28) 


The nonnegativity of the first term in (271 follows by assumption, and the nonnegativity of the 
second term follows by induction; see Appendix 2. 

This result implies that a multivariate spatial Gaussian model constructed using the condi¬ 
tional approach (221 and (23) exists, provided that the univariate covariance functions C'ii(-,-) 
and {Gq|(j.<g)(-, •) : q = 2,...,p} are valid and that the interaction functions {bqri-,-) ■ r = 
1,... ,q — 1; g = 2,... ,p} are integrable, which are mild restrictions. Moreover, these functions 
can be specified completely independently of one another. 


4.3 Joint distributions implied by a network 

Ambient air pollution can cause health problems but not the other way around. Both variables 
exhibit spatio-temporal variabilities, however data are not available to track all of the parcels of air 
and individuals interacting in a space-time cube. Integrating these two spatio-temporal processes 
over time, results in a bivariate spatial process. Is a causal relationship still present? 

Suppose that ao(h; r) is a space-time interaction function, where h and r denote spatial and 
temporal separation respectively. Importantly, assume ao{h;T) is zero for r < 0. We consider the 
mean-zero case and express > 2 ( 5 ; t) as a causative space-time convolution involving Yi {-; •), 


y2{s-,t) 




- T)ao{s 


— v; r)dudr -|- dV (s; t), 


(29) 


where we let dl^ be a mean-zero, Gaussian, temporally uncorrelated process that satisfies vaT{dV(s; t)} 
= 4|t|dt and dt is an infinitesimal interval at t. At each time point t, dV (s; t) is assumed to be spa¬ 
tially correlated in a manner invariant with t. A simple example of such a process is one that is 
space-time separable, where the temporal component is 2\t\^/'^dW{t) for W{-) a Wiener process. 
Interchanging the order of integration, the time-integrated process is 

i™ oT f ^ 2 (s;t)dt= lim ^ [ [ [ Yi{v;t - T)ao{s - v]T)dTdvdt + ^{s), 

T->oo 2T T^oo 2T J_tJdJ-oo 


where it can be shown that the spatial covariance function of ^(s) is identical to that of dF(s;t); 
see Da Prato & Zabczyk (2014 Section 4.2) for a formal treatment. The inner integrand of the first 


term on the right-hand side is a convolution that is a function of t. Applying Fubini’s theorem to 


the convolution (e.g., Wheeden 2015 Ghapter 6), we obtain 


^ 2 (s)= [ Yi{v)bo{s - v)dv + ^{s), 
J D 


(30) 


where Yq{s) = limT->.oo(2r) ^ Yq(s; t)dt, bo(s — v) = ao(s — v; t)dt, and where 

one must ensure that ao{s — v]t)dt < 00 , for all s,u. Clearly, the spatio-temporal interaction 


function ao(-; •) is not identifiable from bo{-), but the causative structure in (29) implies a causative 


relationship in the spatial domain D, from Di to 1^2 through 6o(')- Comparing (30) to the bivariate 
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model in Sectionj^ we can identify cov{^(s),^(u)} with ( 7211 ( 5 ,m). 


We continue the discretization by tesselating D into small finite elements. Then (301 can be 


written as F 2 = BYi + where the elements of the matrix B are defined by discretizing the 
interaction function. This bivariate model can be represented as a simple directed acyclic graph 
with Yi as the parent node and Y 2 as the child node. It is straightforward to see that assumptions 


similar to (29) about the spatio-temporal dependence for p (> 2) variables, will engender a directed 


acyclic graph. This has become an important approach used in multivariate statistical modelling 


(e.g., Cox & Wermuth, 1996), and our research in this paper shows how it generalizes to multivariate 


spatial statistical modelling. The directed acyclic graph structure is equivalent to a partial order on 
the nodes (e.g., Cressie & Davidson 1998). Then ( [Bishop 2006 p. 362); 


[yi,...,yp] = nK|rp,(,)][y«], 

qGR 


(31) 


where Yr is set of spatial processes whose indices are given by all the root nodes, R are all the nodes 
with parents, and pa{q) are all the parent nodes that have a directed edge to node q. 

When there is causative structure between the p variables, expressed through a directed acyclic 
graph, (31) shows that the p\ possible multivariate models reduces to just one. The modeller then 
needs to specify and fit the interaction functions, {bq^pa{q)i-, ■) ■ q S R}, and the multivariate 
marginal distribution [Yr]. The special case of a rooted tree, common in multiresolutional spatial 


models, leaves just one spatial process, say [Yi], to model marginally (e.g., Ver Hoef & Barry 1998 


Huang et al. 2002). If there are feedback loops in the space-time cube considered earlier, temporal 


aggregation will result in undirected edges between the relevant variables. For these edges, a choice 
of direction that results in a directed acyclic graph results in a multivariate model. The fewer 
edges there are that are undirected, the smaller the number of possible multivariate models to fit 
via the conditional approach. Of course, it is possible to combine nodes of the network until all 
remaining edges are directed. In that case, [yQ|pa(Q)] is a |(5|-variate conditional model, where Q is 
the combined node consisting of |(5| spatial variables. 

An undirected edge may be due to directed edges from a missing node in the network; for 
example, exposure to cigarette smoke was a variable missing from the two-node network of jJin et al.j 
(2005), where lung cancer was modelled conditional on esophagus cancer. Without the presence of 
the third node, a full bivariate modelling approach may seem more appropriate than a conditional 
approach. Alternatively, an edge may be undirected because the causative mechanism is not yet well 
understood, and the conditional approach will shed light on this. In this case, both directions can 
be tried, and a model-selection criterion, such as cross-validation, the Akaike information criterion, 
or the deviance information criterion, would indicate the appropriate direction of the edge. In 
Section 5.3 we illustrate a case where the directed edge from the temperature variable to the 
pressure variable is unequivocal. Model selection in this framework amounts to establishing the 
causative links in the network (e.g., Lauritzen 1996 Kolaczyk 2009). 

Finally, there are other, more direct ways that could guide the choice of edges in the network of 
spatial variables. Generally speaking, although not necessarily, Y 2 (-) will be a smoother process than 
Yi(-), due to the integral in ([^. Hence, a Matern model could be fitted to each individual spatial 
process and an ordering of the fitted Matern smoothness parameters could be used to indicate the 
directed edges. A similar problem involving choice of edges was faced by time-series analysts, where 
stationarity was assumed and the dependence was captured through a spectral and cross-spectral 
representation of the process’ covariance and cross-covariance functions. Dahlaus (2000) developed 
a series of hypothesis tests in spectral space to determine undirected edges in a network of temporal 
processes. 
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5 Analyzing a temperatnre-pressnre dataset 
5.1 The data 

We demonstrate the flexibility of the conditional approach on a temperature-pressure dataset used 


in 

Gneiting et al. 

(2010 

1 and Apanasovich et al. 

(2012). The data, which are available with the R 

package RandomFields ( 

Schlather et al. 

20151, are on the error helds, namely the difference between 


temperature and pressure two-day forecasts and the respective observations from monitoring stations 
in the Pacific Northwest of North America on December 18, 2003 at 4 p.m. Since the observations 
are colocated, mi = m 2 = m = 157, and Df = . Both pressure and temperature forecasts 

are spatially smooth, although observations of temperature tend to be more variable than those of 
pressure. The smoothing action of the interaction function in Q can capture this, which implies 
that we should condition on the temperature field. See below, where we diagnosed the suitability of 
this choice by swapping the roles of temperature and pressure in ([^. 


5.2 The processes and their bivariate models 


Here we discuss the spatial processes involved with temperature; a discussion of pressure follows 
likewise. There is a latent temperature process T(-) for which we have observations, Oi(si) = 
T{si) ei^o(si), and forecasts, T’i(si) = T{si) + ei^F{si) (i = 1,...,157), where ei_o(sz) and 
ei.p’(si) are the observation and forecast errors, respectively. Then, the data are Zi{si) = li(si) = 
Fi{si) — Oi{si) = ei,F(si) — ei^oisi) (f = 1,..., 157). Notice that the process Ti(-) itself is defined 
in terms of observations, so we analyze the problem assuming that the temperature data Zi and the 
process Yi at their respective locations are the same. We do likewise for pressure, resulting in data 
Z 2 and the process Y 2 the same as Y 2 . 

For both variables, the data are incomplete and hence cokriging is needed to map the respective 
fields. Specifically, for any sq G D, our goal is to use cokriging to predict Yi(so) and ^ 2 ( 50 ) and 
to compute their prediction standard errors. In what follows, a number of bivariate spatial models 
based on the conditional approach are htted and their performances compared to Matern-type models 


fitted by Gneiting et al. (2010). 


In the conditional approach given by §-( 0 , we need to specify the univariate covariance func¬ 
tions, Cii{s,u) and C 2 |i(s,w), and the integrable interaction function b{s,v). We let the covariance 
functions be isotropic Matern covariance functions given by 0 and ( [T^ . Further, we let b{s,v) be 
a function of displacement, h = v — s, so that bo{h) = b{s,v). The four different models htted are 
written as: 


Model 1 (independent processes): 
Model 2 (pointwise dependence): 

Model 3 (diffused dependence): 
Model 4 (asymmetric dependence): 


bo{h) = 0 , 


bo{h) = A6{h), 



A{i-m\/rrv, 

0 , 

A{I-(||h-A||/r)n^ 

0 , 


11*11 < A 

otherwise, 

\\h- A|| < r, 
otherwise. 


where 6o(') in Models 3 and 4 is a bisquare and shifted bisquare function, respectively. The introduc¬ 
tion of the asymmetric parameter A = (Ai, A 2 )'^ in Model 4 is analogous to applying the shifting 
method of Ver Hoef & Cressie (1993), Christensen & Amemiya (2001 1 , and Li & Zhang (2011) to the 
diffused symmetric dependence in Model 3. We explored whether asymmetry might be present in 
the data by hrst interpolating the temperature and pressure error helds onto a regular grid, and then 
plotting the correlation between the two gridded helds as a function of the displacement vector h of 
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Figure 2: Spatial domain and correlation functions. Left panel: State boundaries and province 
boundaries of a region of the USA and Canada (dark solid lines), with the domain of interest 
enclosed by a bounding polygon (dashed line). The irregular triangular grid used for discretizing D 
(light solid lines) and the observation locations given by (dots) are also shown. The discretized 
spatial domain consists of the vertices of the triangular grid. Right panel: The correlation and 
cross-correlation functions estimated using Model 4, depicted as a function of displacement h, in 
degrees longitude/latitude, at the location s = (—123°,45°). Contour lines of correlation are in 
intervals of 0.2. 


the temperature field. The contour plot showed a clear negative dip in the bottom-right quadrant, 
indicating asymmetry. In contrast, the bivariate spatial models fitted to these data by [Gneitin^ 


et al. (2010) and Apanasovich et al. (2012) are symmetric. 


We discretized both yi(-) and U 2 (-) onto a triangulated grid using the mesher available with the 
R package INLA available from www.r-inla.org. The resulting irregular spatial lattice had ni = n 2 = 
n = 2063 vertices each. Here, these n vertices define D^; see Fig. left panel. Under the chosen 
triangulation, the integral in is approximated as E{Y 2 {si) \ Ui(-)} ^ Ylk=i'^kb{sh'>Jk)Yi{vk), 
where in this case {rjk : fc = 1,..., n} are the areas of the small Voronoi polygons constructed from 


the triangulated grid (e.g.. 

Lee & Schachter 

1980 

). In order to ensure nonnegative-definiteness of 

Cii and C' 2 |i, we follow 

Gneiting et al. 

(2010 

and use chordal distances to establish the covariances 


between two points on the sphere. This embeds Earth’s surface into K°, where univariate covariance 
functions are readily available. The interaction function has no such constraint, so we capture 
asymmetry in the interaction function bo{-) directly in the longitude-latitude space and carry out 
the numerical integration there. 


5.3 Estimation and prediction 

From (fT3, the covariance matrix of the bivariate spatial process is 


cov((y-,r-)-) = 


Ell + U In 
BY, 


^2|l 


EiiH^ 

HEiiH^ 


r| J„ 


(32) 


which is a 4126 x 4126 matrix. The terms and are due to micro-scale effects, which 


we add to make our model comparable with that of Gneiting et al. (2010). Maximum likelihood 
estimation took on the order of 1 minute for Models 1 and 2, and on the order of 1 hour for Models 3 
and 4. Computational requirements when numerical integration is required can be reduced by using 
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Table 1: Parameter estimates for Models 1-4, 
present in the model. 



n 

T2 

0-11 

02\1 

Kll 

Model 1 

0.00 

68.47 

2.60 

275.34 

0.011 

Model 2 

0.00 

67.78 

2.60 

242.04 

0.011 

Model 3 

0.00 

70.16 

2.68 

243.77 

0.011 

Model 4 

0.01 

69.79 

3.02 

199.86 

0.007 


Blank entries indicate that the parameter is not 


'^2|l 

1^11 

^2\l 

A 

r 

0.010 

0.60 

1.56 



0.011 

0.60 

1.58 

-14.30 


0.010 

0.61 

1.84 

-40.83 

1.46 

0.004 

0.56 

1.24 

-65.58 

1.18 


Table 2: Log-likelihood (Log-lik.), Akaike information criterion (AIC) for Models 1-4, the parsimo¬ 
nious Matern model, the shifted parsimonious Materu model, and the full Matern model 



No. of parameters 

Log-lik. 

AIC 

Model 1 

8 

-1276.77 

2569.54 

Model 2 

9 

-1269.92 

2557.84 

Model 3 

10 

-1264.90 

2549.80 

Model 4 

12 

-1258.21 

2540.43 

Parsimonious Matern 

8 

-1265.76 

2547.52 

Shifted parsimonious Matern 

10 

-1260.87 

2541.75 

Full Matern 

11 

-1265.53 

2553.06 


a covariance function Cn that can be evaluated rapidly on a fine grid, or that has compact support 
(Furrer et al., 2012). 

The maximum likelihood estimates of the parameters for the four different interaction functions 
are given in Table Notice that some of the estimates change considerably between model spec¬ 
ifications. For example, the scale parameter CT 2 |i decreases from 275.34 in the independent model 
to 199.86 in the asymmetric-dependence model, which illustrates how some of the variability in the 
pressure error field is accounted for by conditioning on the temperature error field. The estimate of 
the interaction parameter A is also seen to become steadily more negative from Model 2 to Model 
4, implying that the interaction function is most influential when it is allowed to have both a scale 
and an asymmetry term. 

Since out-of-sample spatial prediction is a principal use of multivariate spatial models, we used 
the Akaike information criterion and cross-validation to assess model performance (Stone 19771. As 
seen in Table the Akaike information criterion decreases steadily from Model 1 with 8 parameters 
(2569.54) to the lowest at Model 4 with 12 parameters (2540.43). The symmetric Matern models of 


Gneiting et al. (2010 Table 3) performed worse than Model 4, while the parsimonious Matern model 


gave a similar Akaike information criterion to Model 3. The shifted parsimonious Matern model, 
constructed by applying the method of Li & Zhang (2011) to the parsimonious Matern model, gave 
a similar Akaike information criterion to Model 4. These results were expected due to the similarity 
between Model 3 and the parsimonious Matern model, which is found in Appendix 1, and due to 
the analogy between the approach of Li & Zhang (2011) and our inclusion of A in Model 4. Overall, 
these results suggest that allowing for asymmetries in the model is more important in this problem 
than incorporating smoothness and/or scale parameters in the cross-dependencies. Correlation and 
cross-correlation functions estimated from Model 4 are shown in Figure right panel. 

For our cross-validation analysis we left out a single location and found the predictive distribution 
of both fields at the left-out location using parameters estimated from all the data. In Table which 
is found in Appendix 3, we list the maximum absolute error, the root-mean-squared prediction error, 
and the mean continuous-ranked probability score from our cross-validation study. As well as giving 
results for Models 1-4, we also include those obtained using the parsimonious Matern and the 
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full bivariate Matern models with the RaudomFields package (Schlather et al. 2015) and those 


from the parsimonious Matern model. Within the models we propose, Model 4 and the shifted 
parsimonious Matern model outperformed the others on nearly all cross-validation diagnostics for 
both the pressure and temperature error fields. When compared to the symmetric parsimonious 
and full bivariate Matern models, the asymmetric models offer considerable improvement in the 
prediction performance of both error fields. 

In Fig. left panels, we show the cokriged temperature and pressure error fields under Model 4 
using the entire dataset that includes both Z\ and Z^. Notice how the temperature error field is con¬ 
siderably rougher than the pressure error field. In Fig. right panels, we illustrate for temperature 
the difference between the cokriging standard errors based on Model 1 and those based on Model 
4. The spatial pattern of the standard errors is a clear consequence of the asymmetric covariance 
function: comparing Models 1 and 4, we see that Model 4 tends to have lower standard errors in 
regions that are south-east of the observation locations, which is due to Model 4’s asymmetry. 

Finally, we re-did all the experiments for the same models described above, but now with Yi 
as pressure error and Y 2 as temperature error. With this reversed conditioning, the Akaike infor¬ 
mation criteria for Models 1-3 did not change substantially, however that for Model 4 worsened 
from 2540.43 to 2560.97. Further, the analogous leave-one-out cross-validation diagnostics showed 
that the reversed modelling of temperature error given pressure error in Model 4 resulted in worse 
predictive performance in the respective entries of Table with regard to both temperature and 
pressure. Clearly, the direction of dependence plays a central role here in model performance and 
our conditional approach has allowed us to propose a preferred direction. We discuss this in greater 
detail in Section KSl 


6 Discussion 


The conditional approach can be modified easily for different spatial domains. Consider {Fi(s) : s € 
Hi} and { 12 ( 5 ) : s G H 2 }, for Hi, £>2 C then becomes. 


B(Y2(s) I yi(-)) = f b{s,v)Yi{v)dv; 

JDi 


s e H 2 . 


For example, Cressie & Wikle (2011 p. 287) illustrate bivariate spatial dependence between Mallard 
breeding bird pairs in the Prairie Pothole region of North America and the El Nino phenomenon in 
the tropical Pacihc Ocean, for which the conditional approach could be used. 

In Sectionj^ we estimated the parameters appearing both in C'ii(h), C 2 \i{h), and in the inter¬ 
action function b{s,v). In some cases, 6(s,u) can be given by the underlying science. One such case 


is atmospheric trace-gas inversion (Zammit-Mangion et al. 2015aI, in which a non-Gaussian flux 


field Yi is estimated from the mole-fraction field Y 2 , observed at isolated locations. The interaction 
function 6(s, v) was obtained directly from a transport model driven by weather forecasts and hence 


was assumed known (e.g., Ganesan et al. 2014). 

Even if the parameters are known or estimated off-line, spatial or spatio-temporal inference with 
multivariate models can remain computationally challenging. When treating all variates simultane¬ 
ously in joint form, sparse formulations and sparse linear-algebraic methods can greatly facilitate 


the computation (e.g., Zammit-Mangion et al., 2015b I. Sparseness is guided by the graphical repre¬ 
sentations, which are discussed in Section [4.3| By constructing multivariate spatial models through 
conditioning, the accompanying graphical representations allow exact inference through sequential 
algorithms. Markov chains of spatial processes, such as autoregressive spatio-temporal processes, can 


be tackled with the iterative Rauch-Tung-Striebel smoother (e.g., Rauch et al. 1965). Eor more gen- 
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Figure 3: Cokriging predictions in the discretized spatial domain. Left panels: The cokriged surface 
using maximum likelihood estimates for the parameters with Model 4 for the temperature and 
pressure error fields. Top-right panel: A scatter plot of the cokriging prediction standard errors of 
Yi obtained with Model 4 against those obtained with Model 1 at each of the mesh vertices. The 
colour illustrates the difference between the two, with green denoting the higher standard error of 
Model 4 and purple denoting the higher standard error of Model 1. Bottom-right panel: A spatial 
plot of the difference in the prediction standard errors of Yi obtained with Model 4 and Model 1, 
with green denoting a higher standard error of Model 4 and purple denoting a higher standard error 
of Model 1. 


eral constructions, such as trees or polytrees, the sum-product or peeling algorithm may be used for 
exact inference. When likelihoods associated with some or all of the processes in {Yg : q = 1,... ,p} 
are intractable, approximate message passing may be used to keep the computations tractable (e.g., 
, such as when the data model for Zq{-) is a spatial Poisson point process 
and Yq{-) is the log-intensity of the process. 

Reproducible code and data are available from https://github.com/andrewzm/bicon. 
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Appendix 1 

A class of Matern cross-covariance functions consistent with marginal 
Matern covariance functions 

Let Cii{h),C 22 ih), and bo(h) be isotropic Matern covariance functions on and, for simplicity, 
assume that they all have the same scale n. Then, using obvious notation, their Fourier transforms 
are 

BoM = + ll“f 

r^fw) = 1— —L—+ ||tj||^) "" w e i = 1,2, 

For C' 2 i(-) and Ci 2 {-) to be valid cross-covariance functions, it is required that 

r22(w) — Bo{u!)Bo{—U!)Tii{(jj) > 0, 


and hence that 






1 


cr?i 


.22«-- (^2 


ikir) 


2\2+2i/fa+i/'i 1—1^22 


It can be easily shown that the inequalities, 

Vb > {^22 — vii — 2)/2, 

^22 1 


Cfe < 27r 


Cll V22 — Vix — 2 \Vii 


V22 


(33) 

(34) 

(35) 


are sufficient for (331 to hold. Then, from ([^, Ci 2 (h) is also a Matern covariance function with 

1 Vb^ll 2 2 


variance 


^12 


7TK? Ub + z/ii + 1 


2 2 


and smoothness 1^12 = Vb + vu -|- 1. Hence, from (34|, 1212 > (i^ii + i^22)/2. 

Now consider the bound on the smoothness, 1212 = ( 1^11 + J^22)/2, which is obtained from the 


bound, Vb = {v 22 — ~ 2)/2, in (34). An inequality for the variance (T 22 is then obtained by 

substituting this value of Vb and the inequality (35) into (36): al 2 < 2criitT22(j^iii^22)^^^(j^ii+ ^^ 22 )”^- 


The conditions on 1^12 and ai 2 are those that Gneiting et al. (2010) impose in order to construct 


parsimonious bivariate Matern models. Clearly, these are more restrictive than our conditions (34) 


and (35). 


Generalizing these ideas to arbitrary scale parameters kh, K 22 , ^t,, as in Gneiting et al. (2010) 


could be done, but it is more fruitful to give up the assumption that the interaction function is a 
Matern symmetric nonnegative-definite covariance function; recall that it only needs to be integrable. 


Appendix 2 

Proof of existence of the multivariate process 

Here, we prove by induction that ( p^ holds for for any real numbers {aqk ■ k = 1,... ,nq;q = 
l,...,p}, any nonnegative integers {uq : q = l,...,p} such that ni -I- • • • -I-rzp > 0, and any 
{sqk : k = l,...,nq] q = We have already shown, through ([^, that there exists a 

bivariate stochastic process, and hence the variance of any linear combination of the two processes 
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is nonnegative. Now, assume that {yL(-)) • ■ • ^ defined {p — l)-variate stochastic 

process. We re-write (26) as: 


var 


P—1 'IT'q 

''^'^^qkYqiSqk) + ^ 
q—1 k—1 m—1 



Then, following the definitions for the marginal and cross-covariances in (241 and (25) and using 
standard identities, we obtain the following expression for (26): 


m—1 m' — l 


CLpmUpm' C'p|(9<p)( ^pm 5 ^pm' ) 


-b 

p—Ip—1 rip rip 

E/ E/ E/ E/ 

q—l r—1 m—1 m' — l 

j f bpqi 

J D J D 

w)bpY‘(^Spm‘ 


p—1 p—1 Tlq rip 

r 


-b 

EEE E ^qkapm' 

1 bpri^^pm' • 

:W)Cqr{Sqk,w)dw 


q—l r—1 k—1 m' — l 

' D 



p—Ip—1 riq rip 

p 


-b 

EE E E ^qk'apm 

1 ^pg(^pm5 

v)Cqr{v, Srk’)dv 


q—l r—1 k' — l m—1 

' D 



p—Ip—1 riq Ur 



-b 

^ ^ ^ ^ ^ ^ ^ ^ dqkdrk'Cqr{Sqk-, Srk'')-. 

) 


q—l r—1 k—1 k' — l 

which can be simplified to 


w)dvdw 


Tip Tip 


EE dpm^pm' C'p|(9<p)( ^pm 1 ^pm ') 

m—1 m' — l 

P-1 P-1 /• /* f 

+ EE// E ttqk^is — Sqk) + E dpm^pqi^^pm') ^') 

^—1 ^—1 J D J D \ 1. ^ ^ — 1 


(37) 


q—l r=l 


.fc=l 


X < ^ ark'5{u- Srk') ^ dpm'hpr{Spm',u) > (s, u)d5dw. 




m' — l 


Expression (37) can be further reduced to 


p—1p—1 


■-p --p t' ^ p p p 

^ ^ ^ ^ dpmdpm'Cp^(^q^p^{^SpjYi^ SpjYi'^ E ^ ^ / / Q-g (w)(7g7'(s, lA)dsdz/-, 

m=lm' = l q=lr=l’^^’^^ 


and this is (27). The first term in ( |38[ ) is nonnegative by assumption, while the second term is 
nonnegative since {Ti(-),..., Y^_i(-)}'^ is a well-defined (p — l)-variate process. 


18 








Appendix 3 

Leave-one-out cross-validation diagnostics 


Table 3: Leave-one-out cross-validation prediction diagnostics: mean absolute error (MAE), root- 
mean-squared prediction error (RMSPE), and mean continuous-ranked probability score (MCRPS). 


Process 

Model 

MAE 

RMSPE 

MCRPS 


Model 1 

69.56 

123.36 

55.33 


Model 2 

70.19 

124.4 

55.64 


Model 3 

70.32 

123.0 

55.19 

Pressure (Pa) 

Model 4 

66.07 

114.7 

51.73 


Parsimonious Matern 

70.15 

123.0 

55.35 


Shifted parsimonious Matern 

67.01 

115.0 

52.48 


Full Matern 

66.19 

122.8 

55.23 


Model 1 

1.14 

1.63 

0.81 


Model 2 

1.14 

1.63 

0.81 


Model 3 

1.10 

1.53 

0.78 

Temperature (°C) 

Model 4 

1.08 

1.47 

0.77 


Parsimonious Matern 

1.11 

1.56 

0.79 


Shifted parsimonious Matern 

1.09 

1.48 

0.77 


Full Matern 

1.11 

1.58 

0.79 
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